Overview

Dataset statistics

Number of variables: 11
Number of observations: 800
Missing cells: 386
Missing cells (%): 4.4%
Duplicate rows: 1
Duplicate rows (%): 0.1%
Total size in memory: 63.4 KiB
Average record size in memory: 81.2 B
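
The percentages and the average record size follow from the raw counts. A minimal sanity check, using only the figures printed in this overview (not the underlying data), could look like this:

```python
# Figures copied from the dataset statistics in this report.
n_variables = 11
n_observations = 800
missing_cells = 386
duplicate_rows = 1
total_size_kib = 63.4

# Every observation contributes one cell per variable.
total_cells = n_variables * n_observations

missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```

The missing-cell percentage is taken over all 8800 cells, which is why it is much lower than the per-column missing rate reported for Type 2.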

Variable types

Categorical: 2
Numeric: 8
Boolean: 1

Alerts

Dataset has 1 (0.1%) duplicate row [Duplicates]
Total is highly correlated with HP and 5 other fields [High correlation]
HP is highly correlated with Total and 1 other field [High correlation]
Attack is highly correlated with Total and 2 other fields [High correlation]
Defense is highly correlated with Total and 2 other fields [High correlation]
Sp. Atk is highly correlated with Total and 1 other field [High correlation]
Sp. Def is highly correlated with Total and 2 other fields [High correlation]
Speed is highly correlated with Total [High correlation]
Total is highly correlated with HP and 6 other fields [High correlation]
HP is highly correlated with Total [High correlation]
Attack is highly correlated with Total [High correlation]
Defense is highly correlated with Total and 1 other field [High correlation]
Sp. Atk is highly correlated with Total and 1 other field [High correlation]
Sp. Def is highly correlated with Total and 2 other fields [High correlation]
Speed is highly correlated with Total [High correlation]
Legendary is highly correlated with Total [High correlation]
Total is highly correlated with HP and 4 other fields [High correlation]
HP is highly correlated with Total [High correlation]
Attack is highly correlated with Total [High correlation]
Defense is highly correlated with Total [High correlation]
Sp. Atk is highly correlated with Total [High correlation]
Sp. Def is highly correlated with Total [High correlation]
Type 1 is highly correlated with Type 2 [High correlation]
Type 2 is highly correlated with Type 1 and 1 other field [High correlation]
Total is highly correlated with HP and 6 other fields [High correlation]
HP is highly correlated with Total and 2 other fields [High correlation]
Attack is highly correlated with Total and 2 other fields [High correlation]
Defense is highly correlated with Total and 3 other fields [High correlation]
Sp. Atk is highly correlated with Total and 3 other fields [High correlation]
Sp. Def is highly correlated with Total and 2 other fields [High correlation]
Speed is highly correlated with Total and 1 other field [High correlation]
Generation is highly correlated with Type 2 [High correlation]
Legendary is highly correlated with Total and 1 other field [High correlation]
Type 2 has 386 (48.2%) missing values [Missing]

Reproduction

Analysis started: 2022-07-24 03:01:24.604032
Analysis finished: 2022-07-24 03:01:37.215826
Duration: 12.61 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json
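
A report of this kind is typically produced with a few lines of Python. The sketch below assumes pandas and pandas-profiling are installed. The input path is a placeholder, because this report does not record the source file name, and the title is arbitrary. The exact output may differ from this report depending on the configuration in config.json and on any columns dropped beforehand.

```python
def build_report(csv_path, output_path="report.html"):
    """Profile a CSV file and write the result as an HTML report."""
    # Imported inside the function so that defining it does not require
    # the third-party packages to be installed.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    profile = ProfileReport(df, title="Pandas Profiling Report")
    profile.to_file(output_path)
    return output_path

# Example call (hypothetical file name):
# build_report("dataset.csv")
```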

Variables

Type 1
Categorical

HIGH CORRELATION

Distinct: 18
Distinct (%): 2.2%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 KiB

Water               112
Normal               98
Grass                70
Bug                  69
Psychic              57
Other values (13)   394

Length

Max length: 8
Median length: 5
Mean length: 5.26
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Grass
2nd row: Grass
3rd row: Grass
4th row: Grass
5th row: Fire

Common Values

Value               Count  Frequency (%)
Water                 112  14.0%
Normal                 98  12.2%
Grass                  70  8.8%
Bug                    69  8.6%
Psychic                57  7.1%
Fire                   52  6.5%
Rock                   44  5.5%
Electric               44  5.5%
Dragon                 32  4.0%
Ghost                  32  4.0%
Other values (8)      190  23.8%

Length

Histogram of lengths of the category

Value               Count  Frequency (%)
water                 112  14.0%
normal                 98  12.2%
grass                  70  8.8%
bug                    69  8.6%
psychic                57  7.1%
fire                   52  6.5%
electric               44  5.5%
rock                   44  5.5%
dragon                 32  4.0%
ghost                  32  4.0%
Other values (8)      190  23.8%

Most occurring characters

Value  Count  Frequency (%)
No values found.

Most occurring categories

Value  Count  Frequency (%)
No values found.

Most frequent character per category

Most occurring scripts

Value  Count  Frequency (%)
No values found.

Most frequent character per script

Most occurring blocks

Value  Count  Frequency (%)
No values found.

Most frequent character per block

Type 2
Categorical

HIGH CORRELATION
MISSING

Distinct: 18
Distinct (%): 4.3%
Missing: 386
Missing (%): 48.2%
Memory size: 6.4 KiB

Flying               97
Ground               35
Poison               34
Psychic              33
Fighting             26
Other values (13)   189

Length

Max length: 8
Median length: 6
Mean length: 5.652173913
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Poison
2nd row: Poison
3rd row: Poison
4th row: Poison
5th row: Flying

Common Values

Value               Count  Frequency (%)
Flying                 97  12.1%
Ground                 35  4.4%
Poison                 34  4.2%
Psychic                33  4.1%
Fighting               26  3.2%
Grass                  25  3.1%
Fairy                  23  2.9%
Steel                  22  2.8%
Dark                   20  2.5%
Dragon                 18  2.2%
Other values (8)       81  10.1%
(Missing)             386  48.2%

Length

Histogram of lengths of the category

Value               Count  Frequency (%)
flying                 97  23.4%
ground                 35  8.5%
poison                 34  8.2%
psychic                33  8.0%
fighting               26  6.3%
grass                  25  6.0%
fairy                  23  5.6%
steel                  22  5.3%
dark                   20  4.8%
dragon                 18  4.3%
Other values (8)       81  19.6%

Most occurring characters

Value  Count  Frequency (%)
No values found.

Most occurring categories

Value  Count  Frequency (%)
No values found.

Most frequent character per category

Most occurring scripts

Value  Count  Frequency (%)
No values found.

Most frequent character per script

Most occurring blocks

Value  Count  Frequency (%)
No values found.

Most frequent character per block

Total
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 200
Distinct (%): 25.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 435.1025
Minimum: 180
Maximum: 780
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.4 KiB

Quantile statistics

Minimum: 180
5-th percentile: 250
Q1: 330
Median: 450
Q3: 515
95-th percentile: 630
Maximum: 780
Range: 600
Interquartile range (IQR): 185

Descriptive statistics

Standard deviation: 119.9630398
Coefficient of variation (CV): 0.2757121362
Kurtosis: -0.5074607103
Mean: 435.1025
Median Absolute Deviation (MAD): 85
Skewness: 0.1525299234
Sum: 348082
Variance: 14391.13091
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
600                    37  4.6%
405                    26  3.2%
500                    23  2.9%
580                    23  2.9%
300                    19  2.4%
490                    18  2.2%
525                    16  2.0%
480                    15  1.9%
495                    15  1.9%
330                    15  1.9%
Other values (190)    593  74.1%

Value               Count  Frequency (%)
180                     1  0.1%
190                     1  0.1%
194                     1  0.1%
195                     3  0.4%
198                     1  0.1%
200                     3  0.4%
205                     5  0.6%
210                     3  0.4%
213                     1  0.1%
215                     1  0.1%

Value               Count  Frequency (%)
780                     3  0.4%
770                     2  0.2%
720                     1  0.1%
700                     9  1.1%
680                    13  1.6%
670                     4  0.5%
660                     1  0.1%
640                     1  0.1%
635                     1  0.1%
634                     2  0.2%
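
Several of the statistics reported for each numeric variable are derived from the others. As an illustration, the sketch below recomputes the range, interquartile range, coefficient of variation, variance and mean for Total from the figures printed above. It uses the reported numbers only, not the underlying data, so small rounding differences are expected.

```python
# Figures copied from the Total statistics in this report.
minimum, maximum = 180, 780
q1, q3 = 330, 515
mean = 435.1025
std = 119.9630398
n_observations, total_sum = 800, 348082

value_range = maximum - minimum            # Range
iqr = q3 - q1                              # Interquartile range (IQR)
cv = std / mean                            # Coefficient of variation (CV)
variance = std ** 2                        # Variance
mean_from_sum = total_sum / n_observations # Mean

print(f"Range: {value_range}")
print(f"IQR: {iqr}")
print(f"CV: {cv:.10f}")
print(f"Variance: {variance:.5f}")
print(f"Mean: {mean_from_sum}")
```

The same relationships hold for the other numeric variables in this report.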

HP
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 94
Distinct (%): 11.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 69.25875
Minimum: 1
Maximum: 255
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 35.95
Q1: 50
Median: 65
Q3: 80
95-th percentile: 110
Maximum: 255
Range: 254
Interquartile range (IQR): 30

Descriptive statistics

Standard deviation: 25.53466903
Coefficient of variation (CV): 0.368685098
Kurtosis: 7.232078374
Mean: 69.25875
Median Absolute Deviation (MAD): 15
Skewness: 1.568224376
Sum: 55407
Variance: 652.0193226
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
60                     67  8.4%
50                     63  7.9%
70                     57  7.1%
65                     46  5.8%
75                     43  5.4%
80                     43  5.4%
40                     38  4.8%
45                     38  4.8%
55                     37  4.6%
100                    32  4.0%
Other values (84)     336  42.0%

Value               Count  Frequency (%)
1                       1  0.1%
10                      1  0.1%
20                      6  0.8%
25                      2  0.2%
28                      1  0.1%
30                     13  1.6%
31                      1  0.1%
35                     15  1.9%
36                      1  0.1%
37                      1  0.1%

Value               Count  Frequency (%)
255                     1  0.1%
250                     1  0.1%
190                     1  0.1%
170                     1  0.1%
165                     1  0.1%
160                     1  0.1%
150                     4  0.5%
144                     1  0.1%
140                     1  0.1%
135                     1  0.1%

Attack
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 111
Distinct (%): 13.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 79.00125
Minimum: 5
Maximum: 190
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.4 KiB

Quantile statistics

Minimum: 5
5-th percentile: 30
Q1: 55
Median: 75
Q3: 100
95-th percentile: 136.2
Maximum: 190
Range: 185
Interquartile range (IQR): 45

Descriptive statistics

Standard deviation: 32.45736587
Coefficient of variation (CV): 0.4108462318
Kurtosis: 0.1697173149
Mean: 79.00125
Median Absolute Deviation (MAD): 20
Skewness: 0.551613748
Sum: 63201
Variance: 1053.480599
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
100                    40  5.0%
65                     39  4.9%
80                     37  4.6%
50                     37  4.6%
85                     33  4.1%
60                     33  4.1%
75                     32  4.0%
70                     31  3.9%
90                     30  3.8%
55                     30  3.8%
Other values (101)    458  57.2%

Value               Count  Frequency (%)
5                       2  0.2%
10                      3  0.4%
15                      1  0.1%
20                      8  1.0%
22                      1  0.1%
23                      1  0.1%
24                      1  0.1%
25                      7  0.9%
27                      1  0.1%
29                      1  0.1%

Value               Count  Frequency (%)
190                     1  0.1%
185                     1  0.1%
180                     3  0.4%
170                     2  0.2%
165                     3  0.4%
164                     1  0.1%
160                     5  0.6%
155                     2  0.2%
150                    11  1.4%
147                     1  0.1%

Defense
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 103
Distinct (%): 12.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 73.8425
Minimum: 5
Maximum: 230
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.4 KiB

Quantile statistics

Minimum: 5
5-th percentile: 35
Q1: 50
Median: 70
Q3: 90
95-th percentile: 130
Maximum: 230
Range: 225
Interquartile range (IQR): 40

Descriptive statistics

Standard deviation: 31.18350056
Coefficient of variation (CV): 0.422297465
Kurtosis: 2.72626036
Mean: 73.8425
Median Absolute Deviation (MAD): 20
Skewness: 1.155912303
Sum: 59074
Variance: 972.4107071
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
70                     54  6.8%
50                     49  6.1%
60                     46  5.8%
80                     39  4.9%
40                     36  4.5%
65                     36  4.5%
90                     35  4.4%
100                    33  4.1%
55                     32  4.0%
45                     32  4.0%
Other values (93)     408  51.0%

Value               Count  Frequency (%)
5                       2  0.2%
10                      1  0.1%
15                      4  0.5%
20                      4  0.5%
23                      1  0.1%
25                      2  0.2%
28                      1  0.1%
30                     14  1.8%
32                      2  0.2%
33                      1  0.1%

Value               Count  Frequency (%)
230                     3  0.4%
200                     2  0.2%
184                     1  0.1%
180                     3  0.4%
168                     1  0.1%
160                     3  0.4%
150                     7  0.9%
145                     2  0.2%
140                     6  0.8%
135                     2  0.2%

Sp. Atk
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 105
Distinct (%): 13.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 72.82
Minimum: 10
Maximum: 194
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.4 KiB

Quantile statistics

Minimum: 10
5-th percentile: 30
Q1: 49.75
Median: 65
Q3: 95
95-th percentile: 131.05
Maximum: 194
Range: 184
Interquartile range (IQR): 45.25

Descriptive statistics

Standard deviation: 32.72229417
Coefficient of variation (CV): 0.4493586126
Kurtosis: 0.2978936607
Mean: 72.82
Median Absolute Deviation (MAD): 20
Skewness: 0.7446624978
Sum: 58256
Variance: 1070.748536
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
60                     51  6.4%
40                     49  6.1%
65                     44  5.5%
50                     39  4.9%
55                     35  4.4%
45                     33  4.1%
70                     30  3.8%
35                     29  3.6%
85                     27  3.4%
80                     27  3.4%
Other values (95)     436  54.5%

Value               Count  Frequency (%)
10                      3  0.4%
15                      4  0.5%
20                      8  1.0%
23                      1  0.1%
24                      2  0.2%
25                     11  1.4%
27                      2  0.2%
29                      1  0.1%
30                     24  3.0%
31                      1  0.1%

Value               Count  Frequency (%)
194                     1  0.1%
180                     3  0.4%
175                     1  0.1%
170                     3  0.4%
165                     2  0.2%
160                     2  0.2%
159                     1  0.1%
154                     2  0.2%
150                     9  1.1%
145                     4  0.5%

Sp. Def
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 92
Distinct (%): 11.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 71.9025
Minimum: 20
Maximum: 230
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.4 KiB

Quantile statistics

Minimum: 20
5-th percentile: 32.95
Q1: 50
Median: 70
Q3: 90
95-th percentile: 120
Maximum: 230
Range: 210
Interquartile range (IQR): 40

Descriptive statistics

Standard deviation: 27.8289158
Coefficient of variation (CV): 0.3870368318
Kurtosis: 1.628394057
Mean: 71.9025
Median Absolute Deviation (MAD): 20
Skewness: 0.8540186115
Sum: 57522
Variance: 774.4485544
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
80                     52  6.5%
50                     50  6.2%
55                     47  5.9%
65                     44  5.5%
60                     43  5.4%
75                     40  5.0%
70                     40  5.0%
90                     36  4.5%
45                     35  4.4%
85                     30  3.8%
Other values (82)     383  47.9%

Value               Count  Frequency (%)
20                      6  0.8%
23                      1  0.1%
25                     11  1.4%
30                     20  2.5%
31                      1  0.1%
32                      1  0.1%
33                      1  0.1%
34                      1  0.1%
35                     18  2.2%
36                      1  0.1%

Value               Count  Frequency (%)
230                     1  0.1%
200                     1  0.1%
160                     2  0.2%
154                     3  0.4%
150                     7  0.9%
140                     2  0.2%
138                     1  0.1%
135                     4  0.5%
130                     9  1.1%
129                     1  0.1%

Speed
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 108
Distinct (%): 13.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 68.2775
Minimum: 5
Maximum: 180
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.4 KiB

Quantile statistics

Minimum: 5
5-th percentile: 25
Q1: 45
Median: 65
Q3: 90
95-th percentile: 115
Maximum: 180
Range: 175
Interquartile range (IQR): 45

Descriptive statistics

Standard deviation: 29.06047372
Coefficient of variation (CV): 0.4256229903
Kurtosis: -0.2364366728
Mean: 68.2775
Median Absolute Deviation (MAD): 21
Skewness: 0.3579332951
Sum: 54622
Variance: 844.5111327
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
50                     46  5.8%
60                     44  5.5%
70                     37  4.6%
65                     36  4.5%
30                     35  4.4%
80                     33  4.1%
40                     32  4.0%
90                     31  3.9%
100                    31  3.9%
55                     30  3.8%
Other values (98)     445  55.6%

Value               Count  Frequency (%)
5                       2  0.2%
10                      3  0.4%
15                      9  1.1%
20                     15  1.9%
22                      1  0.1%
23                      4  0.5%
24                      1  0.1%
25                     10  1.2%
28                      4  0.5%
29                      3  0.4%

Value               Count  Frequency (%)
180                     1  0.1%
160                     1  0.1%
150                     4  0.5%
145                     3  0.4%
140                     2  0.2%
135                     2  0.2%
130                     6  0.8%
128                     1  0.1%
127                     1  0.1%
126                     1  0.1%

Generation
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 6
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.32375
Minimum: 1
Maximum: 6
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 3
Q3: 5
95-th percentile: 6
Maximum: 6
Range: 5
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.6612904
Coefficient of variation (CV): 0.4998241145
Kurtosis: -1.239575758
Mean: 3.32375
Median Absolute Deviation (MAD): 2
Skewness: 0.01425810028
Sum: 2659
Variance: 2.759885795
Monotonicity: Increasing

Histogram with fixed size bins (bins=6)
Value               Count  Frequency (%)
1                     166  20.8%
5                     165  20.6%
3                     160  20.0%
4                     121  15.1%
2                     106  13.2%
6                      82  10.2%

Value               Count  Frequency (%)
1                     166  20.8%
2                     106  13.2%
3                     160  20.0%
4                     121  15.1%
5                     165  20.6%
6                      82  10.2%

Value               Count  Frequency (%)
6                      82  10.2%
5                     165  20.6%
4                     121  15.1%
3                     160  20.0%
2                     106  13.2%
1                     166  20.8%

Legendary
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 928.0 B
Value               Count  Frequency (%)
False                 735  91.9%
True                   65  8.1%

Interactions

(Interaction plots for pairs of numeric variables; images not included.)

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
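
The calculation described above can be sketched in plain Python. This is an illustrative implementation, not the code used to produce the report. The function names and sample values are assumptions made for the example, and tied values are given their average rank.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # The 1/n factors of the covariance and the two variances cancel.
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Illustrative values only: y is a nonlinear but monotonic function of x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(f"rho = {rho:.3f}, r = {r:.3f}")
```

Because y increases whenever x does, ρ reaches 1 here while r stays below 1, which is the behaviour the description refers to.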

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
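
As a minimal sketch of that definition, the following plain-Python function computes r and checks the invariance under changes of location and scale. The sample values and the shift and scale constants are arbitrary choices for illustration.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

# Illustrative values only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(x, y)

# Shifting and rescaling each variable separately (with positive scale
# factors) leaves r unchanged.
r_transformed = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])

print(f"r = {r:.3f}, after rescaling = {r_transformed:.3f}")
```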

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
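
The pair-counting definition can be sketched directly. The version below divides by the total number of pairs, as described above (often called tau-a). Statistical libraries commonly apply a tie correction (tau-b), so their results can differ when the data contain ties. The sample values are illustrative.

```python
from itertools import combinations

def kendall_tau(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1      # both variables move the same way
        elif direction < 0:
            discordant += 1      # the variables move in opposite directions
        # Pairs tied in x or y count as neither.
    return (concordant - discordant) / len(pairs)

# Illustrative values only: 10 pairs, 8 concordant and 2 discordant.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
tau = kendall_tau(x, y)
print(f"tau = {tau:.1f}")
```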

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
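
A minimal sketch of a bias-corrected Cramér's V, following the correction as it is commonly stated for Bergsma's estimator, is shown below. It is an illustration in plain Python rather than the report's own implementation, and the function name and sample data are assumptions.

```python
import math
from collections import Counter

def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(x)
    row_totals = Counter(x)
    col_totals = Counter(y)
    observed = Counter(zip(x, y))
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction of phi squared and of the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denominator = min(r_corr - 1, k_corr - 1)
    if denominator <= 0:
        return 0.0  # one of the variables is constant
    return math.sqrt(phi2_corr / denominator)

# Illustrative data: perfectly associated labels, then independent labels.
v_perfect = cramers_v_corrected(["a", "b"] * 10, ["u", "v"] * 10)
v_independent = cramers_v_corrected(["a", "a", "b", "b"] * 5, ["u", "v", "u", "v"] * 5)
print(f"perfect = {v_perfect:.3f}, independent = {v_independent:.3f}")
```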

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available for Phik.

Missing values

Count: a simple visualization of nullity by column.
Matrix: the nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
Dendrogram: the dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

     Type 1   Type 2    Total   HP  Attack  Defense  Sp. Atk  Sp. Def  Speed  Generation  Legendary
  0  Grass    Poison      318   45      49       49       65       65     45           1  False
  1  Grass    Poison      405   60      62       63       80       80     60           1  False
  2  Grass    Poison      525   80      82       83      100      100     80           1  False
  3  Grass    Poison      625   80     100      123      122      120     80           1  False
  4  Fire     NaN         309   39      52       43       60       50     65           1  False
  5  Fire     NaN         405   58      64       58       80       65     80           1  False
  6  Fire     Flying      534   78      84       78      109       85    100           1  False
  7  Fire     Dragon      634   78     130      111      130       85    100           1  False
  8  Fire     Flying      634   78     104       78      159      115    100           1  False
  9  Water    NaN         314   44      48       65       50       64     43           1  False

Last rows

     Type 1   Type 2    Total   HP  Attack  Defense  Sp. Atk  Sp. Def  Speed  Generation  Legendary
790  Flying   Dragon      245   40      30       35       45       40     55           6  False
791  Flying   Dragon      535   85      70       80       97       80    123           6  False
792  Fairy    NaN         680  126     131       95      131       98     99           6  True
793  Dark     Flying      680  126     131       95      131       98     99           6  True
794  Dragon   Ground      600  108     100      121       81       95     95           6  True
795  Rock     Fairy       600   50     100      150      100      150     50           6  True
796  Rock     Fairy       700   50     160      110      160      110    110           6  True
797  Psychic  Ghost       600   80     110       60      150      130     70           6  True
798  Psychic  Dark        680   80     160       60      170      130     80           6  True
799  Fire     Water       600   80     110      120      130       90     70           6  True

Duplicate rows

Most frequently occurring

     Type 1   Type 2    Total   HP  Attack  Defense  Sp. Atk  Sp. Def  Speed  Generation  Legendary  # duplicates
  0  Water    Fighting    580   91      72       90      129       90    108           5  False                 2